Goto

Collaborating Authors

 in-house dataset





RadGS-Reg: Registering Spine CT with Biplanar X-rays via Joint 3D Radiative Gaussians Reconstruction and 3D/3D Registration

Shen, Ao, Fu, Xueming, Jiang, Junfeng, Zeng, Qiang, Tang, Ye, Chen, Zhengming, Nong, Luming, Wang, Feng, Zhou, S. Kevin

arXiv.org Artificial Intelligence

Computed Tomography (CT)/X-ray registration in image-guided navigation remains challenging because of its stringent requirements for high accuracy and real-time performance. Traditional "render and compare" methods, relying on iterative projection and comparison, suffer from spatial information loss and domain gap. 3D reconstruction from biplanar X-rays supplements spatial and shape information for 2D/3D registration, but current methods are limited by dense-view requirements and struggles with noisy X-rays. To address these limitations, we introduce RadGS-Reg, a novel framework for vertebral-level CT/X-ray registration through joint 3D Radiative Gaussians (RadGS) reconstruction and 3D/3D registration. Specifically, our biplanar X-rays vertebral RadGS reconstruction module explores learning-based RadGS reconstruction method with a Counterfactual Attention Learning (CAL) mechanism, focusing on vertebral regions in noisy X-rays. Additionally, a patient-specific pre-training strategy progressively adapts the RadGS-Reg from simulated to real data while simultaneously learning vertebral shape prior knowledge. Experiments on in-house datasets demonstrate the state-of-the-art performance for both tasks, surpassing existing methods. The code is available at: https://github.com/shenao1995/RadGS_Reg.


Two-Steps Neural Networks for an Automated Cerebrovascular Landmark Detection

Nader, Rafic, L'Allinec, Vincent, Bourcier, Romain, Autrusseau, Florent

arXiv.org Artificial Intelligence

--Intracranial aneurysms (ICA) commonly occur in specific segments of the Circle of Willis (CoW), primarily, onto thirteen major arterial bifurcations. An accurate detection of these critical landmarks is necessary for a prompt and efficient diagnosis. We introduce a fully automated landmark detection approach for CoW bifurcations using a two-step neural networks process. Initially, an object detection network identifies regions of interest (ROIs) proximal to the landmark locations. Subsequently, a modified U-Net with deep supervision is exploited to accurately locate the bifurcations. This two-step method reduces various problems, such as the missed detections caused by two landmarks being close to each other and having similar visual characteristics, especially when processing the complete MRA Time-of-Flight (TOF). Additionally, it accounts for the anatomical variability of the CoW, which affects the number of detectable landmarks per scan. We assessed the effectiveness of our approach using two cerebral MRA datasets: our In-House dataset which had varying numbers of landmarks, and a public dataset with standardized landmark configuration. Our experimental results demonstrate that our method achieves the highest level of performance on a bifurcation detection task. HE detection of cerebral vascular bifurcations landmarks is important for multiple clinical applications, including enhanced diagnostic precision, surgical planning, and customized therapeutic interventions.


Deep Learning-Based Semantic Segmentation for Real-Time Kidney Imaging and Measurements with Augmented Reality-Assisted Ultrasound

Luijten, Gijs, Scardigno, Roberto Maria, de Paiva, Lisle Faray, Hoyer, Peter, Kleesiek, Jens, Buongiorno, Domenico, Bevilacqua, Vitoantonio, Egger, Jan

arXiv.org Artificial Intelligence

Ultrasound (US) is widely accessible and radiation-free but has a steep learning curve due to its dynamic nature and non-standard imaging planes. Additionally, the constant need to shift focus between the US screen and the patient poses a challenge. To address these issues, we integrate deep learning (DL)-based semantic segmentation for real-time (RT) automated kidney volumetric measurements, which are essential for clinical assessment but are traditionally time-consuming and prone to fatigue. This automation allows clinicians to concentrate on image interpretation rather than manual measurements. Complementing DL, augmented reality (AR) enhances the usability of US by projecting the display directly into the clinician's field of view, improving ergonomics and reducing the cognitive load associated with screen-to-patient transitions. Two AR-DL-assisted US pipelines on HoloLens-2 are proposed: one streams directly via the application programming interface for a wireless setup, while the other supports any US device with video output for broader accessibility. We evaluate RT feasibility and accuracy using the Open Kidney Dataset and open-source segmentation models (nnU-Net, Segmenter, YOLO with MedSAM and LiteMedSAM). Our open-source GitHub pipeline includes model implementations, measurement algorithms, and a Wi-Fi-based streaming solution, enhancing US training and diagnostics, especially in point-of-care settings.


Beyond Glucose-Only Assessment: Advancing Nocturnal Hypoglycemia Prediction in Children with Type 1 Diabetes

Voegeli, Marco, Laguna, Sonia, Leutheuser, Heike, Pfister, Marc, Burckhardt, Marie-Anne, Vogt, Julia E

arXiv.org Artificial Intelligence

The dead-in-bed syndrome describes the sudden and unexplained death of young individuals with Type 1 Diabetes (T1D) without prior long-term complications. One leading hypothesis attributes this phenomenon to nocturnal hypoglycemia (NH), a dangerous drop in blood glucose during sleep. This study aims to improve NH prediction in children with T1D by leveraging physiological data and machine learning (ML) techniques. We analyze an in-house dataset collected from 16 children with T1D, integrating physiological metrics from wearable sensors. We explore predictive performance through feature engineering, model selection, architectures, and oversampling. To address data limitations, we apply transfer learning from a publicly available adult dataset. Our results achieve an AUROC of 0.75 +- 0.21 on the in-house dataset, further improving to 0.78 +- 0.05 with transfer learning. This research moves beyond glucose-only predictions by incorporating physiological parameters, showcasing the potential of ML to enhance NH detection and improve clinical decision-making for pediatric diabetes management.


MRSegmentator: Robust Multi-Modality Segmentation of 40 Classes in MRI and CT Sequences

Häntze, Hartmut, Xu, Lina, Dorfner, Felix J., Donle, Leonhard, Truhn, Daniel, Aerts, Hugo, Prokop, Mathias, van Ginneken, Bram, Hering, Alessa, Adams, Lisa C., Bressem, Keno K.

arXiv.org Artificial Intelligence

Purpose: To introduce a deep learning model capable of multi-organ segmentation in MRI scans, offering a solution to the current limitations in MRI analysis due to challenges in resolution, standardized intensity values, and variability in sequences. Materials and Methods: he model was trained on 1,200 manually annotated MRI scans from the UK Biobank, 221 in-house MRI scans and 1228 CT scans, leveraging cross-modality transfer learning from CT segmentation models. A human-in-the-loop annotation workflow was employed to efficiently create high-quality segmentations. The model's performance was evaluated on NAKO and the AMOS22 dataset containing 600 and 60 MRI examinations. Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD) was used to assess segmentation accuracy. The model will be open sourced. Results: The model showcased high accuracy in segmenting well-defined organs, achieving Dice Similarity Coefficient (DSC) scores of 0.97 for the right and left lungs, and 0.95 for the heart. It also demonstrated robustness in organs like the liver (DSC: 0.96) and kidneys (DSC: 0.95 left, 0.95 right), which present more variability. However, segmentation of smaller and complex structures such as the portal and splenic veins (DSC: 0.54) and adrenal glands (DSC: 0.65 left, 0.61 right) revealed the need for further model optimization. Conclusion: The proposed model is a robust, tool for accurate segmentation of 40 anatomical structures in MRI and CT images. By leveraging cross-modality learning and interactive annotation, the model achieves strong performance and generalizability across diverse datasets, making it a valuable resource for researchers and clinicians. It is open source and can be downloaded from https://github.com/hhaentze/MRSegmentator.


Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition

Li, Yang, Lai, Liangzhen, Shangguan, Yuan, Iandola, Forrest N., Ni, Zhaoheng, Chang, Ernie, Shi, Yangyang, Chandra, Vikas

arXiv.org Artificial Intelligence

Transformer-based models excel in speech recognition. Existing efforts to optimize Transformer inference, typically for long-context applications, center on simplifying attention score calculations. However, streaming speech recognition models usually process a limited number of tokens each time, making attention score calculation less of a bottleneck. Instead, the bottleneck lies in the linear projection layers of multi-head attention and feedforward networks, constituting a substantial portion of the model size and contributing significantly to computation, memory, and power usage. To address this bottleneck, we propose folding attention, a technique targeting these linear layers, significantly reducing model size and improving memory and power efficiency. Experiments on on-device Transformer-based streaming speech recognition models show that folding attention reduces model size (and corresponding memory consumption) by up to 24% and power consumption by up to 23%, all without compromising model accuracy or computation overhead.


Privacy-preserving Early Detection of Epileptic Seizures in Videos

Mehta, Deval, Sivathamboo, Shobi, Simpson, Hugh, Kwan, Patrick, O`Brien, Terence, Ge, Zongyuan

arXiv.org Artificial Intelligence

In this work, we contribute towards the development of video-based epileptic seizure classification by introducing a novel framework (SETR-PKD), which could achieve privacy-preserved early detection of seizures in videos. Specifically, our framework has two significant components - (1) It is built upon optical flow features extracted from the video of a seizure, which encodes the seizure motion semiotics while preserving the privacy of the patient; (2) It utilizes a transformer based progressive knowledge distillation, where the knowledge is gradually distilled from networks trained on a longer portion of video samples to the ones which will operate on shorter portions. Thus, our proposed framework addresses the limitations of the current approaches which compromise the privacy of the patients by directly operating on the RGB video of a seizure as well as impede real-time detection of a seizure by utilizing the full video sample to make a prediction. Our SETR-PKD framework could detect tonic-clonic seizures (TCSs) in a privacy-preserving manner with an accuracy of 83.9% while they are only half-way into their progression. Our data and code is available at https://github.com/DevD1092/seizure-detection